Hybrid perceptual audio coding.

A hybrid coding technique for high quality coding of audio signals, using a subband filtering
technique further refined to achieve a large number of subbands. Noise masking thresholds for 
subbands are then determined using a new tonality measure applicable to individual frequency bands 
or single frequencies. Based on the thresholds so determined, input signals are coded to achieve high 
quality at reduced bit rates. 
HYBRID PERCEPTUAL AUDIO CODING



Field of the Invention 

The present invention relates to coding of time varying signals, such as audio signals representing voice 
or music information. 

Background of the Invention 

In recent years several advanced bit rate reduction algorithms for high quality digital audio have been
proposed (see e.g., Schroeder, E.F. & Voessing, W.: "High quality digital audio encoding with 3.0 bits/sample
using adaptive transform coding," 80th AES Convention, Montreux 1986, Preprint 2321 (B2); Theile, G. & Link, M.
& Stoll, G.: "Low bit rate coding of high quality audio signals," 82nd AES Convention, London 1987, Preprint
2432 (C-1); Brandenburg, K.: "OCF - A new coding algorithm for high quality sound signals," in Proc. of the
1987 Int. Conf. on Acoust., Speech and Signal Proc. ICASSP 1987, pp. 141-144; and Johnston, J.D.: "Transform
Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications,
Vol. 6 (1988), pp. 314-323). Nearly transparent quality can be achieved at bit rates down to 64 kb/s
using frequency domain approaches (see e.g., Brandenburg, K. & Seitzer, D.: "OCF: Coding High Quality Audio
with Data Rates of 64 kbit/sec," 85th AES Convention, Los Angeles 1988; Johnston, J.D.: "Perceptual Transform
Coding of Wideband Stereo Signals," pp. 1993-1996, ICASSP 1989; and Theile, G. & Stoll, G. & Link, M.: "Low
bit-rate coding of high quality audio signals. An introduction to the MASCAM system," EBU Review - Technical,
No. 230 (August 1988), pp. 71-94).

FIG. 1 shows the basic block diagram common to all perceptual frequency domain coders. A filterbank 101
is used to decompose the input signal into subsampled spectral components. The subsampled spectral
components are then used to calculate an estimate of the actual (time dependent) masking threshold in block 102
using rules known from psychoacoustics (see e.g., Zwicker, E.: "Psychoakustik" (in German), Berlin Heidelberg
New York 1982; Hellman, R. P.: "Asymmetry of masking between noise and tone," Perception and
Psychophysics, Vol. 11, pp. 241-246, 1972; and Scharf, B.: Chapter 5 of Foundations of Modern Auditory
Theory, New York, Academic Press, 1970). The spectral components are then quantized and coded in block
103 with the aim of keeping the noise, which is introduced by quantizing, below the masking threshold.
Depending on the algorithm this step is done in very different ways, from simple block companding to
analysis-by-synthesis systems using additional noiseless compression.

Finally, a multiplexer 104 is used to assemble the bitstream, which typically consists of the quantized and
coded spectral coefficients and some side information, e.g., bit allocation information.

There are two filterbank designs commonly used in the above arrangement. One type is the so-called
tree-structured filterbank (see e.g., the QMF filterbank described in Jayant, N. S. & Noll, P.: Digital Coding of
Waveforms: Principles and Applications to Speech and Video, Englewood Cliffs 1984), which is designed with the
filter bandwidth of the individual bands set according to the critical bands as known from psychoacoustics. Also
known are those filter banks used in transform coders (see e.g., Jayant, N. S. & Noll, P., above, and Zelinski,
R. & Noll, P.: "Adaptive Transform Coding of Speech Signals," IEEE Trans. on Acoustics, Speech and Signal
Processing, ASSP-25 (1977), pp. 299-309), which use a windowed transform to implement a filter bank with
equal bandwidth filters with low computational complexity. Transform coders typically calculate 128 to 1024
spectral components, which also can be grouped by critical bands.

The basic problem of the design of an analysis/synthesis system for use in high quality digital audio coding
is the trade-off between time domain and frequency domain behavior. If more spectral components are used,
the masking functions can be estimated with better accuracy. In addition, a higher decorrelation of the spectral
components, and therefore a higher coding gain, can be achieved. On the other hand, a higher spectral
resolution necessitates less time resolution, which leads to problems with pre-echoes (see e.g., Vaupelt, Th.:
"Ein Kompander zur Unterdrueckung von hoerbaren Stoerungen bei dynamischen Signalpassagen fuer ein
Transformationscodierungsverfahren fuer qualitativ hochwertige Audiosignale (MSC)," (in German), ITG
Fachbericht 106, pp. 209-216; and Brandenburg, K.: "High quality sound coding at 2.5 bit/sample," 84th AES
Convention, Paris 1988, Preprint 2582) and longer processing delay.


Summary of the Invention 

The present invention provides structure and methods which seek to overcome the limitations of the prior
art through a closer match to the processing of audio signals by the human ear. More specifically, the present







EP 0 446 037 A2 



invention models the ear as a filterbank, but with differing time and frequency resolution at different frequencies. 
Thus the present invention provides an analysis framework that achieves a better fit to the human ear. 

The hybrid coder of the present invention, in a typical embodiment, uses a quadrature mirror filter to perform
an initial separation of input audio signals into appropriate frequency bands. This filtered output is again filtered
using a windowed transform to achieve the effect of a computationally efficient filter bank with many channels.

Masking thresholds for the filtered signals are then determined using a "superblock" technique. As in earlier
work by the present inventors, a "tonality" measure is used in actually developing appropriate masking
thresholds. In the present invention, however, an improved tonality measure that is local to critical bands, or
even a single spectral line, is used. Advantageously, well known OCF coding and quantization techniques are
then used to further process the perceptually coded signals for transmission or storage.

Brief Description of the Drawing 

FIG. 1 shows a general block diagram of a perceptual coder.
FIG. 2 shows a basic analysis system used in the hybrid coder of the present invention in the context of a
system of the type shown in FIG. 1.
FIG. 3 shows a time/frequency breakdown of the hybrid analysis structure of FIG. 2.
FIG. 4 shows a short time spectrum of a test signal.
FIG. 5 shows a block diagram of the iteration loops of a typical implementation of the present invention.

Detailed Description 

THE NEW ANALYSIS/SYNTHESIS FILTERBANK 

The hybrid coder in accordance with an illustrative embodiment of the present invention uses a hybrid
QMF/Transform filterbank. FIG. 2 shows the basic analysis/synthesis system. The time domain values are first
filtered by a conventional QMF-tree filterbank 201-203. This filterbank is used to get 4 channels with 3 to 12
kHz bandwidth (frequency resolution) and, accordingly, 2 to 8 sample time resolution. The QMF filterbank was
chosen only because optimized filters were readily available that satisfied our design goals. It proves
convenient to use 80-tap QMF filters derived from Johnston, J. D., "A Filter Family Designed for Use in Quadrature
Mirror Filter Banks," ICASSP 1980, pp. 291-294. This 80-tap filter is clearly an overdesign; a filter of lower
computational complexity will clearly suffice.
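
The band layout produced by such a tree can be illustrated with a short Python sketch. It assumes a 48 kHz sampling rate (24 kHz Nyquist frequency) and assumes that only the low half is split again at each of the three stages; both assumptions are inferred from the band edges of Table 1 rather than stated for filterbank 201-203. The function name `qmf_tree_bands` is hypothetical.

```python
def qmf_tree_bands(nyquist_hz, depth):
    """Band edges of a QMF tree in which only the low half is split again."""
    bands = []
    lo, hi = 0.0, float(nyquist_hz)
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        bands.append((mid, hi))  # the high half leaves the tree as one channel
        hi = mid                 # the low half goes on to the next QMF stage
    bands.append((lo, hi))       # whatever remains is the lowest channel
    return sorted(bands)


# Three stages, assumed 24 kHz Nyquist frequency.
bands = qmf_tree_bands(24000, 3)
for lo, hi in bands:
    print(f"{lo:8.0f} Hz to {hi:8.0f} Hz, bandwidth {hi - lo:8.0f} Hz")
```

Under these assumptions the sketch yields four channels with bandwidths between 3 and 12 kHz, in line with the figures quoted above.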

It is well known that classical QMF-tree filterbanks do not yield "perfect reconstruction" of the input signal.
However, the 80-tap filter illustratively used yields near perfect reconstruction of the analysis/synthesis filter
bank in the sense that the sum of the pass band ripple is below 16 bit resolution. Thus, rounding leads to perfect
reconstruction.

The output signals of the QMF-tree are filtered again, this time using a windowed transform to get a
computationally efficient filter bank 210-213 with many channels. The window used is a sine window, using 50 %
overlap of the analysis blocks. Two different transforms have been used for this purpose. The first transform
that may be used is a classical DFT, which calculates 65 or 129 (lowest frequencies) complex lines. In this
approach the analysis-synthesis filterbank is not critically sampled. On the other hand, prediction of the
complex frequency lines can be easily used to reduce the data rate further. Alternatively, a modified DCT (MDCT)
as used in Brandenburg, K.: "Ein Beitrag zu den Verfahren und der Qualitaetsbeurteilung fuer hochwertige
Musikcodierung," (in German), Ph.D. thesis, Universitaet Erlangen-Nuernberg 1989, and described first in
Princen, J. & Johnson, A. & Bradley, A.: "Subband / Transform Coding Using Filter Bank Designs Based on Time
Domain Aliasing Cancellation," in Proc. of the 1987 Int. Conf. on Acoustics, Speech and Signal Processing
ICASSP 87, pp. 2161-2164, may be used. This technique calculates 64 or 128 frequency values per subband
and is critically sampled. Using this MDCT approach, only half the samples have to be quantized and encoded
as compared to the DFT solution.
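
The reason a sine window suits 50 % overlapped blocks can be checked numerically. The Python sketch below assumes the common form w[k] = sin(pi (k + 0.5) / N), which the text does not spell out, and a block length of 256 chosen only for illustration. It verifies that the squared window values of two overlapping blocks sum to one, which is the condition under which time domain aliasing cancels in an MDCT analysis/synthesis pair.

```python
import math


def sine_window(n):
    """Sine window of length n (assumed form; the text only says 'sine window')."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def overlap_power_sum(window):
    """w[k]^2 + w[k + N/2]^2 for the overlapping halves of two adjacent blocks."""
    half = len(window) // 2
    return [window[k] ** 2 + window[k + half] ** 2 for k in range(half)]


w = sine_window(256)          # illustrative block length, not taken from the text
sums = overlap_power_sum(w)
print(min(sums), max(sums))   # both should be very close to 1
```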

The combined filterbank has a frequency resolution of 23.4 Hz at low frequencies and 187.5 Hz at high
frequencies, with a corresponding difference in time resolution. While the time resolution is illustratively
quantized to powers of 2, advances in the analysis/synthesis method will provide more range in time/frequency
resolution as well as less quantization. Depending on the frequency band, the characteristics of the filter bank
are similar to an MDCT filterbank of block length 1024 at low frequencies and 128 at high frequencies. Thus,
the frequency resolution at low frequencies is sufficient for the perceptual model, and the time resolution at
high frequencies is short enough for pre-echo control without additional algorithmic accommodation. Table 1
shows time and frequency resolution values for the combined filter bank used in the hybrid coder.
Lower bound     Upper bound     Frequency     Time          Time
in Frequency    in Frequency    Resolution    Resolution    Resolution
Hz              Hz              Hz            samples       ms

0.0             3000.           23.4          1024          21.3
3000.           6000.           46.8          512           10.7
6000.           12000.          93.6          256           5.3
12000           24000           187.2         128           2.7

Table 1 - Time and frequency resolution of the analysis/synthesis filterbank
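
The entries of Table 1 can be approximately reproduced from the subband edges and the number of frequency values per subband. The Python sketch below is only an illustration: the 48 kHz sampling rate and the assignment of 128 lines to the lowest subband and 64 lines to the others are assumptions drawn from the surrounding description, and the relation "time resolution in samples = Nyquist frequency / frequency resolution" is inferred from the table itself. The computed frequency resolutions (23.44, 46.88, 93.75 and 187.5 Hz) differ slightly from the tabulated 46.8, 93.6 and 187.2 Hz, which look like repeated doublings of the rounded value 23.4.

```python
FS_HZ = 48000.0  # assumed sampling rate; Table 1 extends to 24000 Hz

# (lower edge in Hz, upper edge in Hz, assumed number of frequency lines)
SUBBANDS = [
    (0.0, 3000.0, 128),
    (3000.0, 6000.0, 64),
    (6000.0, 12000.0, 64),
    (12000.0, 24000.0, 64),
]


def resolution(lo_hz, hi_hz, n_lines, fs_hz=FS_HZ):
    """Return (frequency resolution in Hz, time resolution in samples, in ms)."""
    freq_res = (hi_hz - lo_hz) / n_lines
    time_samples = (fs_hz / 2.0) / freq_res
    time_ms = 1000.0 * time_samples / fs_hz
    return freq_res, time_samples, time_ms


rows = [resolution(*band) for band in SUBBANDS]
for (lo, hi, _), (df, nt, ms) in zip(SUBBANDS, rows):
    print(f"{lo:7.0f} {hi:7.0f} {df:8.2f} {nt:6.0f} {ms:6.1f}")
```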

in order to code the signal transparently, according to the threshold model.

CALCULATION OF TONALITY

masking noise and noise as a masker. See e.g. ... a global tonality of the short time spectrum ...

very tonal (i.e., coherent from transform block to ... lead to a very conservative ...
correctly for the sensitive (tonal) parts of such a signal, the formula previously used ...

The predicted values of the magnitude r and the phase $\phi$ at time t are calculated as:

$$\hat{r}(t,f) = r(t-1,f) + \bigl(r(t-1,f) - r(t-2,f)\bigr)$$

and

$$\hat{\phi}(t,f) = \phi(t-1,f) + \bigl(\phi(t-1,f) - \phi(t-2,f)\bigr)$$

The Euclidean distance between the actual and predicted values is used to get the new tonality metric c(t,f).
Then,

$$c(t,f) = \frac{\mathrm{dist}\bigl((\hat{r}(t,f), \hat{\phi}(t,f)),\ (r(t,f), \phi(t,f))\bigr)}{r(t,f) + \mathrm{abs}(\hat{r}(t,f))}$$

If the prediction turns out to be very good, c(t,f) will have values near zero. On the other hand, for very
unpredictable (noisy) signals c(t,f) will have values of up to 1 with a mean of 0.5. This "inverse tonality" or
"measure of chaos" is converted to a tonality metric by a simple log-linear operation.

$$t = \alpha \ln c + \beta$$

The new tonality metric is used to estimate the masking threshold at each spectral component in the same way
as described in the Johnston paper cited above for the old tonality metric. The program in Listing 1 illustrates
the processing used to form c(t,f) in the context of a 512 sample input sequence. The program of Listing 1 is
written in the well-known FORTRAN programming language described, e.g., in FX/FORTRAN Programmer's
Handbook, Alliant Computer Systems Corp., 1988. The program is intended for use on general purpose
computers marketed by Alliant Computer Systems Corp., but may be readily adapted for use on other general
purpose or special purpose processors.
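
For readers who prefer a scalar view of the per-line calculation, the following Python sketch follows the prediction and distance formulas given above for a single spectral line. It is an illustration under stated assumptions, not a transcription of Listing 1: the function names are hypothetical, the constants alpha = -0.43 and beta = -0.299 and the clamping of c to the range 0.05 to 0.5 are taken from Listing 1, the small `eps` term only guards against division by zero, and the additional folding of values above 0.5 performed in the listing is not reproduced.

```python
import math


def predict(prev1, prev2):
    """Linear extrapolation from the two preceding blocks."""
    return prev1 + (prev1 - prev2)


def chaos(r, phi, r_hat, phi_hat, eps=1e-12):
    """Normalized Euclidean distance between actual and predicted line."""
    dx = r * math.cos(phi) - r_hat * math.cos(phi_hat)
    dy = r * math.sin(phi) - r_hat * math.sin(phi_hat)
    return math.hypot(dx, dy) / (r + abs(r_hat) + eps)


def tonality(c, alpha=-0.43, beta=-0.299):
    """Log-linear conversion of the chaos measure (constants from Listing 1)."""
    c = min(max(c, 0.05), 0.5)
    return alpha * math.log(c) + beta


# A steady tone: constant magnitude, phase advancing by 0.3 rad per block.
r_hat = predict(1.0, 1.0)
phi_hat = predict(0.3, 0.0)
c_tone = chaos(1.0, 0.6, r_hat, phi_hat)

# Same magnitude, but the phase lands opposite to the prediction.
c_flip = chaos(1.0, 0.6 + math.pi, r_hat, phi_hat)

print(c_tone, tonality(c_tone))
print(c_flip, tonality(c_flip))
```

In this sketch the steady tone gives a chaos value of essentially zero and a tonality close to one, while the phase-flipped line gives a chaos value close to one and a tonality close to zero.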

In a typical version of the hybrid coder in accordance with the present teachings, the quantization and
coding scheme of the OCF (Optimum Coding in the Frequency domain) system described in Brandenburg, K. &
Seitzer, D.: "OCF: Coding High Quality Audio with Data Rates of 64 kbit/sec," 85th AES Convention, Los Angeles
1988, has been used. In that analysis-by-synthesis scheme the spectral components are first quantized using
a nonuniform quantizer. In the inner iteration loop (rate loop) the count of bits needed to code the quantized
values using an entropy code is compared to the number of available bits. Depending on the ratio of actual
over available bits the quantization step size is adjusted, leading to a different number of bits needed to code
the block of quantized values. The outer iteration loop (distortion control loop) compares actual quantization
noise energy for each critical band with the estimated masking threshold. If the actual noise exceeds the
masking threshold in some critical band, the scale of the spectral components in this critical band is adjusted to
yield a lower quantization noise. FIG. 5 shows a block diagram of the iteration loops used for quantization and
coding. The algorithm is described in more detail in the papers by Johnston and Brandenburg and Seitzer, as
well as the Brandenburg thesis, all cited above.

FIG. 5 shows the manner in which a coder such as the OCF system
uses the psychoacoustic threshold and related information described above to produce the actual bitstream to
be transmitted or stored. Thus, input information on input 500 is assumed to have been appropriately buffered,
partitioned into convenient blocks and transformed in the manner described above. The appropriate variable
resolution spectral information is also provided to block 504, which provides the psychoacoustic evaluation for
weighting frequency signals in block 501 prior to quantization in block 502. The actual entropy coding is
represented by block 503 in FIG. 5. Thus the information describing the spectral information of the input signals
is provided on output 515. Side information describing the psychoacoustic evaluation and quantizing processes
is then supplied on outputs 520 and 525. All outputs are conveniently multiplexed into a single bit stream for
transmission or storage.
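
The inner iteration loop (rate loop) described above can be sketched in Python as follows. This is a deliberately simplified illustration and not the OCF algorithm: a uniform quantizer is swapped in for the nonuniform quantizer, the hypothetical function `count_bits` stands in for a real entropy coder, the step size grows by a fixed factor instead of being adjusted according to the ratio of actual over available bits, and the outer distortion control loop is omitted.

```python
import math


def count_bits(quantized):
    """Hypothetical stand-in for an entropy coder: cost grows with magnitude."""
    return sum(1 + 2 * math.ceil(math.log2(abs(v) + 1)) for v in quantized)


def rate_loop(spectrum, available_bits, step=1.0, growth=1.25, max_iter=200):
    """Coarsen the quantizer step until the block fits the bit budget."""
    for _ in range(max_iter):
        quantized = [int(round(x / step)) for x in spectrum]
        bits = count_bits(quantized)
        if bits <= available_bits:
            return step, quantized, bits
        step *= growth  # too many bits: use a coarser quantizer
    raise RuntimeError("rate loop did not converge")


# Illustrative spectral values and bit budget (made up for this sketch).
SPECTRUM = [100.0, -50.0, 25.0, 12.0, -6.0, 3.0, 1.0, 0.5]
step, quantized, bits = rate_loop(SPECTRUM, available_bits=40)
print(step, quantized, bits)
```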

The Perceptual Entropy (PE) (see e.g., Johnston, James D., "Estimation of Perceptual Entropy Using Noise
Masking Criteria," ICASSP '88, pp. 2524-2527) is an estimate of the information content of a piece of music
relative to the capabilities of the human auditory system. It gives an estimate of the minimum bit rate necessary
for total transparent coding of a piece of music using a given analysis/synthesis scheme. As introduced in this
last-cited paper by Johnston, the PE is calculated from the number of quantization levels necessary to code a
piece of music at the masking threshold.

Using the analysis/synthesis framework of the hybrid coder, estimates of the PE have been calculated for
different pieces of music. Table 2 lists some of the results and compares them to the PE measured using other
analysis/synthesis systems. It can be seen that the hybrid coder performs well compared to the older results.

music            Old PE           New PE
(type)           (bits/sample)    (bits/sample)

organ            .24              .48
suzanne vega     .69              .54
castanets        .73              .52

Table 2: Results of PE measurements
1362 bits available for each block at 64 kb/s. ... resolution has been described. ...

enhanced processing is accomplished at a decoder. ...

Information used at a receiver or decoder to reconstruct ... the information and side
... that provided as outputs from ...

information, after demultiplexing if required, is used ... global gain, quantizer step size, scal-
LISTING 1

c     First startup routine
      subroutine sm()
c     sets up threshold generation tables, ithr and bval
      real freq(0:25)/0.,100.,200.,300.,400.,510.,630.,770.,
     1 920.,1080.,1270.,1480.,1720.,2000.,2320.,2700.,
     1 3150.,3700.,4400.,5300.,6400.,7700.,9500.,12000.,15500.,
     1 25000./
      common/thresh/ithr(26),bval(257),rnorm(257)
      common/absthr/abslow(257)
      common/sigs/ifirst
c     ithr(i) is bottom of critical band i. bval is bark index
c     of each line
      write(*,*) 'what spl will +-32000 be -> '
      read(*,*) abslev
      abslev=abslev-96.
      abslow=5224245.*5224245./exp(9.6*alog(10.))
      ifirst=0
      write(*,*) 'what is the sampling rate'
      read(*,*) rzotx
      fnyq=rzotx/2.
c     nyquist frequency of interest.
      ithr(1)=2.
      i=2
10    ithr(i)=freq(i-1)/fnyq*256.+2.
      i=i+1
      if (freq(i-1) .lt. fnyq) goto 10
c     sets ithr to bottom of cb
      ithr(i:26)=257
c     now, set up the critical band indexing array
      bval(1)=0
c     first, figure out frequency, then ...
      do i=2,257,1
      fre=(i-1)/256.*fnyq
c     write(*,*) i,fre
c     fre is now the frequency of the line. convert
c     it to critical band number..
      do j=0,25,1
      if ( fre .gt. freq(j) ) k=j
      end do
c     so now, k = last CB lower than fre
      rpart=fre-freq(k)
      range=freq(k+1)-freq(k)
      bval(i)=k+rpart/range
      end do
      rnorm=1
      do i=2,257,1
      tmp=0
      do j=2,257,1
      tmp=tmp+sprdngf(bval(j),bval(i))
      end do
      rnorm(i)=tmp
      end do
      rnorm=1./rnorm
      do i=1,257,1
      write(*,*) i, bval(i), 10.*alog10(rnorm(i))
      end do
      call openas(0,'/usr/jj/nsrc/thrtry/freqlist',0)
      do i=2,257,1
      read(0,*) ii,db
      if ( ii .ne. i ) then
      write(*,*) 'freqlist is bad.'
      stop
      end if
      db=exp((db-abslev)/10.*alog(10.))
      write(*,*) i,db
      abslow(i)=abslow(i)*db
      end do
      abslow(1)=1.
      write(*,*) 'lowest level is ', sqrt(abslow(45))
      return
      end
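
The conversion from line frequency to a fractional critical band (Bark) index performed by the startup routine can be restated in Python as below. This is a sketch for illustration only: the band edge table is the commonly used set of critical band edges and is an assumption wherever the printed listing is illegible, the function name `bark_index` is hypothetical, and the search stops one edge short of the end so that the interpolation never reads past the table.

```python
# Assumed critical band edges in Hz (standard values; partly illegible in print).
BAND_EDGES_HZ = [
    0.0, 100.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0,
    1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0,
    5300.0, 6400.0, 7700.0, 9500.0, 12000.0, 15500.0, 25000.0,
]


def bark_index(freq_hz, edges=BAND_EDGES_HZ):
    """Fractional critical band index: band number plus position inside the band."""
    k = 0
    for j in range(len(edges) - 1):
        if freq_hz > edges[j]:
            k = j  # last band edge strictly below freq_hz
    return k + (freq_hz - edges[k]) / (edges[k + 1] - edges[k])


for f in (0.0, 150.0, 1000.0, 10000.0):
    print(f, bark_index(f))
```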

c     Threshold calculation program
      subroutine thrgen(rt,phi,thr)
      real r(257),phi(257)
      real rt(257)
      real thr(257)
      common/blnk/ or(257),ophi(257),dr(257),dphi(257)
      common/blk1/othr(257)
      real alpha(257),tr(257),tphi(257)
      real beta(257),bcalc(257)
      common/absthr/abslow(257)
      common/thresh/ithr(26),bval(257),rnorm(257)
      common/sigs/ifirst
      r=max(rt,.0005)
      bcalc=1.
      if (ifirst .eq. 0) then
      or=0.
      othr=1e20
      ophi=0
      dr=0
      dphi=0
      ifirst=1
      end if
c     this subroutine figures out the new threshold values
c     using line-by-line measurement.
      tr=or+dr
      tphi=ophi+dphi
      dr=r-or
      dphi=phi-ophi
      or=r
      ophi=phi
      alpha=sqrt((r*cos(phi)-tr*cos(tphi))
     1 *(r*cos(phi)-tr*cos(tphi))
     2 +(r*sin(phi)-tr*sin(tphi))
     3 *(r*sin(phi)-tr*sin(tphi)))
     4 /(r + abs(tr)+1.)
      beta=alpha
c     now, beta is the unweighted tonality factor
      alpha=r*r
c     now, the energy is in each
c     line. Must spread.
c     write(*,*) 'before spreading'
      thr=0.
      bcalc=0.
cvd$l cncall
      do i=2,257,1
cvd$l cncall
      do j=2,257,1
      glorch=sprdngf(bval(j),bval(i))
      thr(i)=alpha(j)*glorch+thr(i)
      bcalc(i)=alpha(j)*glorch*beta(j)+bcalc(i)
c     thr is the spread energy, bcalc is the weighted chaos
      end do
c     if (thr(i) .eq. 0 ) then
c     write(*,*) 'zero threshold,'
c     stop
c     end if
      bcalc(i)=bcalc(i)/thr(i)
      if (bcalc(i) .gt. .5) bcalc(i)=1.-bcalc(i)
c     that normalizes bcalc to 0-.5
      end do
c     write(*,*) 'after spreading'
      bcalc=max(bcalc,.05)
      bcalc=min(bcalc,.5)
c     bcalc is now the chaos metric, convert to the
c     tonality metric
      bcalc=-.43*alog(bcalc)-.299
c     now calculate DB
      bcalc=max(24.5,(15.5+bval))*bcalc+5.5*(1.-bcalc)
      bcalc=exp( (-bcalc/10.) * alog (10.) )
c     now, bcalc is actual tonality factor, for power
c     space.
      thr=thr*rnorm*bcalc
c     threshold is tonality factor times energy (with normalization)
      thr=max(thr,abslow)
      alpha=thr
      thr=min(thr,othr*2.)
      othr=alpha
      write(*,*) 'leaving thrgen'
      return
      end
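
The step in the threshold routine that turns the tonality metric into a power-domain factor can be isolated in a few lines of Python. The sketch below is a scalar restatement for one spectral line with hypothetical function names; the constants 24.5, 15.5 and 5.5 dB are those appearing in the listing, and the interpretation of the result as the amount by which the threshold lies below the spread energy follows the listing's comments.

```python
def masking_offset_db(t, bark):
    """Offset in dB between spread energy and threshold for tonality t (0..1)."""
    return max(24.5, 15.5 + bark) * t + 5.5 * (1.0 - t)


def threshold_factor(t, bark):
    """The same offset expressed as a multiplicative factor on energy."""
    return 10.0 ** (-masking_offset_db(t, bark) / 10.0)


for t, bark in ((0.0, 0.0), (1.0, 0.0), (1.0, 20.0), (0.5, 10.0)):
    print(t, bark, masking_offset_db(t, bark), threshold_factor(t, bark))
```

A noise-like line (t near 0) thus keeps the threshold only 5.5 dB below the spread energy, whereas a tonal line (t near 1) pushes it at least 24.5 dB below, and further at high Bark indices.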

c     And, the spreading function
      function sprdngf(j,i)
      real i,j
      real sprdngf
c     this calculates the value of the spreading function for
c     the i'th bark, with the center being the j'th
c     bark
      temp1=i-j
      temp2=15.811389 +7.5*(temp1+.474)
      temp2=temp2- 17.5*sqrt(1.+ (temp1+.474)*(temp1+.474) )
      if ( temp2 .le. -100. ) then
      temp3=0.
      else
      temp2=temp2/10.*alog(10.)
      temp3=exp(temp2)
      end if
      sprdngf=temp3
      return
      end
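
As a cross-check of the spreading function, the following Python sketch evaluates the same expression as a function of the Bark distance dz = i - j between the evaluated line and the masker. The function name `spreading` is hypothetical, and 10 ** (x / 10) is used in place of the listing's equivalent exp(x / 10 * ln 10).

```python
import math


def spreading(dz):
    """Spreading function value (power ratio) at Bark distance dz from the masker."""
    db = 15.811389 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)
    if db <= -100.0:
        return 0.0  # contributions below -100 dB are treated as zero
    return 10.0 ** (db / 10.0)


for dz in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(dz, spreading(dz))
```

The function peaks at approximately unity for dz = 0 and falls off more slowly toward higher Bark values than toward lower ones, reflecting the upward spread of masking.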
Claims

1. A method of processing an ordered time sequence of audio signals partitioned into blocks of samples, said
method comprising
determining a discrete short-time spectrum, $S(\omega_i)$, i = 1, 2, ..., N, for each of said blocks,
determining the value of a tonality function as a function of frequency, and
based on said tonality function, estimating the noise masking threshold for each of $\omega_i$,
CHARACTERIZED IN THAT
said step of determining $S(\omega_i)$ comprises determining $S(\omega_i)$ with differing time and frequency
resolution as a function of $\omega_i$.

2. The method of claim 1 further
CHARACTERIZED IN THAT
said step of determining $S(\omega_i)$ comprises determining $S(\omega_i)$ with frequency and time resolution
approximating that of human auditory response.

3. The method of claim 1 further
CHARACTERIZED IN THAT
predicting, for each block, an estimate of the values for each $S(\omega_i)$ based on the values for $S(\omega_i)$ for
one or more prior blocks,
determining for each frequency $\omega_i$ a randomness metric based on respective ones of the predicted
value for $S(\omega_i)$ and the actual value for $S(\omega_i)$ for each block,
said method further comprises the step of
quantizing said $S(\omega_i)$ based on said noise masking threshold at respective $\omega_i$.

4. The method of claim 3 further
CHARACTERIZED IN THAT
said step of predicting comprises,
for each $\omega_i$, forming the difference between the value of $S(\omega_i)$ for the corresponding $\omega_i$ from the two
preceding blocks, and
adding said difference to the value for $S(\omega_i)$ from the immediately preceding block.

5. The method of claim 4,
CHARACTERIZED IN THAT
said $S(\omega_i)$ is represented in terms of its magnitude and phase, and said difference and adding are
effected separately for both magnitude and phase of $S(\omega_i)$.

6. The method of claim 3,
CHARACTERIZED IN THAT
said determining of said randomness metric is accomplished by calculating the euclidian distance
between said estimate of $S(\omega_i)$ and said actual value for $S(\omega_i)$.

7. The method of claim 6,
CHARACTERIZED IN THAT
said determining of said randomness metric further comprises normalizing said euclidian distance
with respect to the sum of the magnitude of said actual magnitude for $S(\omega_i)$ and the absolute value of said
estimate of $S(\omega_i)$.

8. The method of claim 1,
CHARACTERIZED IN THAT
said estimating of the noise masking threshold at each $\omega_i$ comprises
determining the energy of said audio signal at $\omega_i$, and said method further comprises
spreading said energy values at a given $\omega_i$ to one or more adjacent frequencies, thereby to generate
a spread energy spectrum, and
determining a desired noise level at each $\omega_i$ based on said tonality function and said spread energy
spectrum for the respective $\omega_i$.

9. The method of claim 8, wherein said estimating of the noise masking threshold function further comprises
modifying said threshold function in response to an absolute noise masking threshold for each $\omega_i$ to form
a limited threshold function.

10. The method of claim 9,
CHARACTERIZED IN THAT
said method further comprising modifying said limited threshold function to eliminate any existing
pre-echoes, thereby generating an output threshold function value for each $\omega_i$.

11. The method of any of claims 1, 8, 9 or 10, further
CHARACTERIZED IN THAT
said method further comprising the steps of
generating an estimate of the number of bits necessary to encode $S(\omega_i)$,
quantizing said $S(\omega_i)$ to form quantized representations of said $S(\omega_i)$ using said estimate of the
number of bits, and
providing to a medium a coded representation of said quantized values and information describing
how said quantized values were derived.

12. A method for decoding an ordered sequence of coded signals comprising first code signals representing
values of the frequency components corresponding to a block of values of an audio signal and second code
signals representing information about how said first signals were derived to represent said audio signal
with reduced perceptual error, said method comprising
using said second signals to determine quantizing levels for said audio signal which reflect a reduced
level of perceptual distortion,
reconstructing quantized values for said frequency content of said audio signal in accordance with
said quantizing levels, and
transforming said reconstructed quantized spectrum to recover an estimate of the audio signal,
CHARACTERIZED IN THAT
said frequency components have variable time and frequency resolution.

13. The method of claim 12
CHARACTERIZED IN THAT
said second signals identify the variation of said resolution as a function of frequency, and
said reconstructing comprises using said second signals to effect scaling of said quantized values.

14. The method of claim 12
CHARACTERIZED IN THAT
said reconstructing comprises applying a global gain factor based on said second signals.

15. The method of claim 12
CHARACTERIZED IN THAT
said reconstructing comprises determining quantizer step size as a function of frequency component.

16. The method of claim 12
CHARACTERIZED IN THAT
said second signals include information about the degree of coarseness of quantization as a function
of frequency component.

17. The method of claim 12
CHARACTERIZED IN THAT
said second signals include information about the number of values of said audio signal that occur
in each block.

18. Apparatus for processing an ordered time sequence of audio signals partitioned into blocks of samples
comprising,
means for determining a discrete short-time spectrum, $S(\omega_i)$, i = 1, 2, ..., N, for each of said blocks,
means for determining the value of a tonality function as a function of frequency, and
means for estimating the noise masking threshold for each of $\omega_i$, in response to said tonality function,
said means for determining further comprises
means for determining $S(\omega_i)$ with differing time and frequency resolution at different values of $\omega_i$.

19. The method of claim 1 further
CHARACTERIZED IN THAT
said means for determining $S(\omega_i)$ comprises determining $S(\omega_i)$ with frequency and time resolution
approximating that of human auditory response.

20. The apparatus of claim 18
CHARACTERIZED IN THAT
said means for determining $S(\omega_i)$ comprises
means for partitioning said audio signal into a plurality of frequency subbands, and
means for determining the short-time spectrum for each subband.

21. The apparatus of claim 20
CHARACTERIZED IN THAT
said means for partitioning comprises quadrature mirror filter means.

22. The apparatus of claim 21
CHARACTERIZED IN THAT
said quadrature mirror filter means comprises a tree structured array of quadrature mirror filters.
FIG. 4